131 research outputs found

    Semantic annotation of multilingual learning objects based on a domain ontology

    One of the important tasks in the use of learning resources in e-learning is the need to annotate learning objects with appropriate metadata. However, annotating resources by hand is time-consuming and difficult. Here we explore the problem of automatically extracting metadata to describe learning resources. First, theoretical constraints for gathering certain types of metadata important for e-learning systems are discussed. Our approach to annotation is then outlined. It is based on a domain ontology, which allows us to annotate learning resources in a language-independent way. We are motivated by the fact that the leading providers of learning content in various domains are often spread across countries speaking different languages. As a result, cross-language annotation can facilitate the accessibility, sharing and reuse of learning resources.
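
    The abstract does not give implementation details, but the core idea of ontology-based, language-independent annotation can be sketched as follows: each ontology concept carries labels in several languages, and a resource is annotated with the concept URI whenever any of its labels matches, so resources in different languages receive the same annotations. The concepts, labels and URIs below are illustrative assumptions, not the authors' actual domain ontology (Python):

        # Minimal sketch: the ontology below is a toy assumption.
        # Each concept carries labels in several languages, so the same
        # concept URI is produced whatever the resource's language.
        ONTOLOGY = {
            "ex:NeuralNetwork": {"en": ["neural network"],
                                 "de": ["neuronales netz"],
                                 "cs": ["neuronová síť"]},
            "ex:Ontology": {"en": ["ontology"], "de": ["ontologie"]},
        }

        def annotate(text):
            """Return concept URIs whose labels (in any language) occur in the text."""
            text = text.lower()
            return {concept
                    for concept, labels in ONTOLOGY.items()
                    for lang_labels in labels.values()
                    if any(label in text for label in lang_labels)}

        # German and English resources map to the same concept URI:
        print(annotate("Was ist ein neuronales Netz?"))     # {'ex:NeuralNetwork'}
        print(annotate("Introduction to neural networks"))  # {'ex:NeuralNetwork'}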

    Information Extraction from Biomedical Texts

    Recently, there has been much effort in making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. As a matter of fact, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units with semantic information in these texts is the first step towards making the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. Then we discuss the main benefits and disadvantages of state-of-the-art information extraction systems and, as a result, adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a huge amount of training data, which can be laborious to gather. To face up to this tedious problem, we investigate the concept of weakly supervised, or bootstrapping, techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially bring down the amount of labeled data needed for training the extraction system.
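
    The bootstrapping idea referred to above can be illustrated briefly: start from a handful of seed instances, induce extraction patterns from the contexts in which the seeds occur, then apply those patterns to harvest new instances, and iterate. The toy corpus and seed pair below are assumptions for illustration, not the setup used in the thesis (Python):

        # Illustrative bootstrapping sketch; corpus and seeds are toy assumptions.
        import re

        corpus = [
            "Aspirin is used to treat headache.",
            "Ibuprofen is used to treat fever.",
            "Penicillin cures pneumonia.",
        ]
        seeds = {("aspirin", "headache")}   # known (drug, disease) seed pair

        for _ in range(2):                  # a couple of bootstrapping rounds
            # 1) induce patterns from the contexts of known pairs
            patterns = set()
            for drug, disease in seeds:
                for sent in corpus:
                    s = sent.lower()
                    if drug in s and disease in s:
                        ctx = s.split(drug, 1)[1].rsplit(disease, 1)[0].strip(" .")
                        patterns.add(ctx)   # e.g. "is used to treat"
            # 2) apply the patterns to harvest new pairs
            for pat in patterns:
                for sent in corpus:
                    m = re.match(rf"(\w+) {re.escape(pat)} (\w+)\.", sent.lower())
                    if m:
                        seeds.add((m.group(1), m.group(2)))

        print(seeds)  # now also contains ('ibuprofen', 'fever')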

    My repository is being aggregated: a blessing or a curse?

    Usage statistics are frequently used by repositories to justify their value to the management who decide about the funding that supports the repository infrastructure. Another reason for collecting usage statistics at repositories is the increased use of webometrics in assessing the impact of publications and researchers. Consequently, one of the worries repositories sometimes have about their content being aggregated is that aggregations have a detrimental effect on the accuracy of the statistics they collect. They believe that this potential decrease in reported usage can negatively influence the funding provided by their own institutions. This raises the fundamental question of whether repositories should allow aggregators to harvest their metadata and content. In this paper, we discuss the benefits of allowing content aggregators to harvest repository content and investigate how to overcome the drawbacks.

    Extraction of semantic relations from texts

    In recent years the amount of unstructured data stored on the Internet and other digital sources has increased significantly. These data often contain valuable but hard-to-retrieve information. The term unstructured data refers mainly to data that lack a predefined structure or data model. As a result, unstructured data are not easily readable by machines. In this work, we present a simple method for the automatic extraction of semantic relations that can be used to precisely locate valuable pieces of information.
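
    The abstract does not name the patterns the method uses, but a "simple method" of this kind is typically pattern-based; the sketch below uses the classic Hearst patterns for the hyponymy relation purely as an illustration, not as the paper's actual method (Python):

        # Hypothetical pattern-based relation extractor; the Hearst-style
        # patterns below are chosen for illustration only.
        import re

        PATTERNS = [
            # (regex, relation): group 1 = hypernym, group 2 = hyponym
            (re.compile(r"(\w+) such as (\w+)", re.I), "is-a"),
            (re.compile(r"(\w+), including (\w+)", re.I), "is-a"),
        ]

        def extract_relations(text):
            """Return (hyponym, relation, hypernym) triples found in the text."""
            triples = []
            for pattern, rel in PATTERNS:
                for m in pattern.finditer(text):
                    triples.append((m.group(2), rel, m.group(1)))
            return triples

        print(extract_relations("Diseases such as diabetes are chronic."))
        # -> [('diabetes', 'is-a', 'Diseases')]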

    From open access metadata to open access content: two principles for increased visibility of open access content

    An essential goal of the open access (OA) movement is the free availability of research outputs on the Internet. One of the recommended ways to achieve this is through open access repositories (BOAI, 2002). Given the growing number of repositories and the significant proportion of research outputs already available as OA (Laakso & Bjork, 2012), it might come as a surprise that OA content is not necessarily easily discoverable on the Internet (Morrisson, 2012; Konkiel, 2012); more precisely, it is available, but often difficult to find. If OA content in repositories cannot be discovered, there is little incentive to make it available on the Internet in the first place. Therefore, not trying hard enough to increase the visibility of OA content would be a lost opportunity for achieving the main OA goals, including the reuse potential of OA content. In this paper, we build on our experience in finding and aggregating open access content (not just metadata) from repositories, discussing the main issues and summarizing the lessons learned into two principles that, if adopted, will dramatically increase the discoverability of OA content on the Internet and improve the possibilities for OA content reuse.
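
    As a concrete illustration of the gap between metadata and content, the sketch below harvests Dublin Core records from a repository's OAI-PMH interface and checks whether any dc:identifier points directly at the full text; the endpoint URL is a placeholder, while the OAI-PMH verb and XML namespaces are standard (Python):

        # Sketch: harvest oai_dc metadata over OAI-PMH and check for direct
        # full-text links. The endpoint is a placeholder, not a real repository.
        import urllib.request
        import xml.etree.ElementTree as ET

        OAI = "{http://www.openarchives.org/OAI/2.0/}"
        DC = "{http://purl.org/dc/elements/1.1/}"

        endpoint = "https://repository.example.org/oai"   # placeholder endpoint
        url = endpoint + "?verb=ListRecords&metadataPrefix=oai_dc"

        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())

        for record in root.iter(OAI + "record"):
            ids = [e.text for e in record.iter(DC + "identifier") if e.text]
            # Does any dc:identifier point straight at the full text?
            direct = [i for i in ids if i.lower().endswith(".pdf")]
            print("direct full-text link" if direct else "metadata only", ids[:1])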

    Using Explicit Semantic Analysis for Cross-Lingual Link Discovery

    This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this paper, we present a comparative study of these methods on the Wikipedia corpus and provide new insights into the evaluation of link discovery systems. In particular, we measure the agreement of human annotators in linking articles in different language versions of Wikipedia, and compare it to the results achieved by the presented methods.
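
    To make the ESA representation concrete: a text is mapped to a vector of similarities against a collection of Wikipedia concepts, and because concepts are aligned across language versions through inter-language links, vectors computed from texts in different languages become comparable. The three concepts and their "articles" below are toy placeholders, not real Wikipedia data (Python, using scikit-learn):

        # Toy ESA sketch: a text is represented by its TF-IDF similarity to a
        # set of Wikipedia concepts; the concept texts here are placeholders.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        concepts = ["Music", "Computer science", "Biology"]   # aligned concept IDs
        concept_texts = [                                      # one doc per concept
            "music melody rhythm instrument song",
            "computer algorithm software programming data",
            "biology cell organism gene evolution",
        ]

        vectorizer = TfidfVectorizer()
        concept_matrix = vectorizer.fit_transform(concept_texts)   # concept space

        def esa_vector(text):
            """Map a text into the concept space: one weight per concept."""
            return cosine_similarity(vectorizer.transform([text]), concept_matrix)[0]

        doc = "a program implements an algorithm that processes data"
        for concept, weight in zip(concepts, esa_vector(doc)):
            print(f"{concept}: {weight:.2f}")   # highest weight: Computer science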